159 research outputs found

Parallel netCDF: A Scientific High-Performance I/O Interface

Author: Choudhary Alok
Gropp William
Latham Rob
Li Jianwei
Liao Wei-keng
Ross Robert
Thakur Rajeev
Publication date: 01/01/2003

Dataset storage, exchange, and access play a critical role in scientific applications. For such purposes netCDF serves as a portable and efficient file format and programming interface, which is popular in numerous scientific application domains. However, the original interface does not provide an efficient mechanism for parallel data storage and access. In this work, we present a new parallel interface for writing and reading netCDF datasets. This interface is derived with minimum changes from the serial netCDF interface but defines semantics for parallel access and is tailored for high performance. The underlying parallel I/O is achieved through MPI-IO, allowing for dramatic performance gains through the use of collective I/O optimizations. We compare the implementation strategies with HDF5 and analyze both. Our tests indicate programming convenience and significant I/O performance improvement with this parallel netCDF interface. Comment: 10 pages, 7 figure

arXiv.org e-Print Archive

CiteSeerX

Parallel Implementation of Lossy Data Compression for Temporal Data Sets

Author: Agrawal Ankit
Choudhary Alok
Federrath Christoph
Hendrix William
Liao Wei-keng
Son Seung Woo
Yuan Zheng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/03/2017

Many scientific data sets contain temporal dimensions. Such data record values at the same spatial locations at different time stamps. Some of the biggest temporal data sets are produced by parallel computing applications such as simulations of climate change and fluid dynamics. Temporal data sets can be very large and take a long time to transfer among storage locations. Data compression lets files be transferred faster and saves storage space. NUMARCK is a lossy data compression algorithm for temporal data sets that learns the emerging distributions of element-wise change ratios along the temporal dimension and encodes them into an index table for a concise representation. This paper presents a parallel implementation of NUMARCK. Evaluated with six data sets obtained from climate and astrophysics simulations, parallel NUMARCK achieved scalable speedups of up to 8788 when running 12800 MPI processes on a parallel computer. We also compare the compression ratios against two lossy data compression algorithms, ISABELA and ZFP. The results show that NUMARCK achieved a higher compression ratio than ISABELA and ZFP. Comment: 10 pages, HiPC 201

arXiv.org e-Print Archive

Crossref
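
The NUMARCK abstract above describes encoding element-wise change ratios between time steps into an index table. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes the change ratio is the relative change `(curr - prev) / prev`, that `prev` contains no zeros, and that ratios are grouped by simple equal-width binning; the abstract does not say how the distribution is actually learned. The names `encode_step` and `decode_step` are hypothetical.

```python
def encode_step(prev, curr, num_bins=8):
    """Encode `curr` relative to `prev` as (index table, per-element bin indices).

    Assumption: equal-width binning of change ratios stands in for whatever
    distribution-learning step the real algorithm uses.
    """
    # Element-wise change ratio along the temporal dimension.
    # Assumes no element of `prev` is zero (a simplification).
    ratios = [(c - p) / p for p, c in zip(prev, curr)]
    lo, hi = min(ratios), max(ratios)
    span = hi - lo
    if span == 0.0:
        # All elements changed by the same ratio: one table entry suffices.
        return [lo], [0] * len(ratios)
    width = span / num_bins
    # Index table: one representative ratio (the bin centre) per bin.
    table = [lo + (i + 0.5) * width for i in range(num_bins)]
    # Each element stores only the index of the bin its ratio falls into.
    indices = [min(int((r - lo) / width), num_bins - 1) for r in ratios]
    return table, indices


def decode_step(prev, table, indices):
    """Lossy reconstruction of the current time step from the previous one."""
    return [p * (1.0 + table[i]) for p, i in zip(prev, indices)]


if __name__ == "__main__":
    prev = [10.0, 20.0, 40.0, 80.0]
    curr = [11.0, 19.0, 44.0, 80.0]
    table, indices = encode_step(prev, curr)
    print(indices)
    print(decode_step(prev, table, indices))
```

In this sketch each element costs only a small bin index instead of a full floating-point value, and the reconstructed ratio is off by at most half a bin width. A parallel version would presumably split the elements across processes and agree on a shared table, but that part is not shown here.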

A Java Graphical User Interface for Large-Scale Scientific Computations in Distributed Systems

Author: Choudhary Alok
Liao Wei-keng
Shen X
Singh A
Thiruvathukal George K.
Publication venue: Loyola eCommons
Publication date: 01/01/2000

Large-scale scientific applications present great challenges to computational scientists in terms of obtaining high performance and in managing large datasets. These applications (most of which are simulations) may employ multiple techniques and resources in a heterogeneously distributed environment. Effective working in such an environment is crucial for modern large-scale simulations. In this paper, we present an integrated Java graphical user interface (IJ-GUI) that provides a control platform for managing complex programs and their large datasets easily. As far as performance is concerned, we present and evaluate our initial implementation of two optimization schemes: data replication and data prediction. Data replication can take advantage of 'temporal locality' by caching the remote datasets on local disks; data prediction, on the other hand, provides prefetch hints based on the datasets' past activities that are kept in databases. We first introduce the data contiguity concept in such an environment that guides data prediction. The relationship between the two approaches is discussed.

Crossref

Loyola eCommons

Environment Diversification with Multi-head Neural Network for Invariant Learning

Author: Huang Bo-Wei
Kao Chang-Sheng
Liao Keng-Te
Lin Shou-De
Publication date: 17/08/2023

Neural networks are often trained with empirical risk minimization; however, it has been shown that a shift between training and testing distributions can cause unpredictable performance degradation. On this issue, a research direction, invariant learning, has been proposed to extract invariant features insensitive to the distributional changes. This work proposes EDNIL, an invariant learning framework containing a multi-head neural network to absorb data biases. We show that this framework does not require prior knowledge about environments or strong assumptions about the pre-trained model. We also reveal that the proposed algorithm has theoretical connections to recent studies discussing properties of variant and invariant features. Finally, we demonstrate that models trained with EDNIL are empirically more robust against distributional shifts. Comment: In Proceedings of 36th Conference on Neural Information Processing Systems (NeurIPS 2022

arXiv.org e-Print Archive

Design, implementation, and evaluation of parallel pipelined STAP on parallel computers

Author: Choudhary Alok
Liao Wei-keng
Linderman Mark
Linderman Richard
Varshney Pramod
Weiner Donald
Publication venue: SURFACE at Syracuse University
Publication date: 01/01/1998

Performance results are presented for the design and implementation of parallel pipelined space-time adaptive processing (STAP) algorithms on parallel computers. In particular, the issues involved in parallelization, our approach to parallelization, and performance results on an Intel Paragon are described. The process of developing software for such an application on parallel computers when latency and throughput are considered together is discussed, and the tradeoffs with respect to inter- and intra-task communication and data redistribution are presented. The results show that not only was scalable performance achieved for the individual component tasks of STAP, but linear speedups were also obtained for the integrated task performance, for both latency and throughput. Results are presented for up to 236 compute nodes (limited by the machine size available to us). Another interesting observation from the implementation results is that assigning additional processors to one task can improve the performance of other tasks without any increase in the number of processors assigned to them. Normally, this cannot be predicted by theoretical analysis.

Syracuse University Research Facility and Collaborative Environment